skip to main content


Search for: All records

Creators/Authors contains: "Yang, Hanmei"

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?

Some links on this page may take you to non-federal websites. Their policies may differ from this site.

  1. The NUMA architecture accommodates the hardware trend of an increasing number of CPU cores. It requires the coop- eration of memory allocators to achieve good performance for multithreaded applications. Unfortunately, existing allo- cators do not support NUMA architecture well. This paper presents a novel memory allocator – NUMAlloc , that is de- signed for the NUMA architecture. NUMAlloc is centered on a binding-based memory management. On top of it, NUMAl- loc proposes an “origin-aware memory management” to ensure the locality of memory allocations and deallocations, as well as a method called “incremental sharing” to balance the performance benefits and memory overhead of using transparent huge pages. According to our extensive evalua- tion, NUMAlloc hasthebestperformanceamongallevaluated allocators, running 15.7% faster than the second-best allo- cator (mimalloc), and 20.9% faster than the default Linux allocator with reasonable memory overhead. NUMAlloc is also scalable to 128 threads and is ready for deployment. 
    more » « less
    Free, publicly-accessible full text available June 18, 2024
  2. Deadlocks are notorious bugs in multithreaded programs, causing serious reliability issues. However, they are difficult to be fully expunged before deployment, as their appearances typically depend on specific inputs and thread schedules, which require the assistance of dynamic tools. However, existing deadlock detection tools mainly focus on locks, but cannot detect deadlocks related to condition variables. This paper presents a novel approach to fill this gap. It extends the classic lock dependency to generalized dependency by abstracting the signal for the condition variable as a special resource so that communication deadlocks can be modeled as hold-and-wait cycles as well. It further designs multiple practical mechanisms to record and analyze generalized dependencies. In the end, this paper presents the implementation of the tool, called UnHang. Experimental results on real applications show that UnHang is able to find all known deadlocks and uncover two new deadlocks. Overall, UnHang only imposes around 3% performance overhead and 8% memory overhead, making it a practical tool for the deployment environment. 
    more » « less
  3. The cache plays a key role in determining the performance of applications, no matter for sequential or concurrent programs on homogeneous and heterogeneous architecture. Fixing cache misses requires to understand the origin and the type of cache misses. However, this remains to be an unresolved issue even after decades of research. This paper proposes a unified profiling tool--CachePerf--that could correctly identify different types of cache misses, differentiate allocator-induced issues from those of applications, and exclude minor issues without much performance impact. The core idea behind CachePerf is a hybrid sampling scheme: it employs the PMU-based coarse-grained sampling to select very few susceptible instructions (with frequent cache misses) and then employs the breakpoint-based fine-grained sampling to collect the memory access pattern of these instructions. Based on our evaluation, CachePerf only imposes 14% performance overhead and 19% memory overhead (for applications with large footprints), while identifying the types of cache misses correctly. CachePerf detected 9 previous-unknown bugs. Fixing the reported bugs achieves from 3% to 3788% performance speedup. CachePerf will be an indispensable complementary to existing profilers due to its effectiveness and low overhead. 
    more » « less
  4. null (Ed.)
    This paper studies a new convex variational model for denoising and deblurring images with multiplicative noise. Considering the statistical property of the multiplicative noise following Nakagami distribution, the denoising model consists of a data fidelity term, a quadratic penalty term, and a total variation regularization term. Here, the quadratic penalty term is mainly designed to guarantee the model to be strictly convex under a mild condition. Furthermore, the model is extended for the simultaneous denoising and deblurring case by introducing a blurring operator. We also study some mathematical properties of the proposed model. In addition, the model is solved by applying the primal-dual algorithm. The experimental results show that the proposed method is promising in restoring (blurred) images with multiplicative noise. 
    more » « less